Interpretability-Aware Pruning for Efficient Medical Image Analysis
Malik, Nikita, Seth, Pratinav, Singh, Neeraj Kumar, Chitroda, Chintan, Sankarapu, Vinay Kumar
Deep learning has driven significant advances in medical image analysis, yet its adoption in clinical practice remains constrained by the large size and lack of transparency in modern models. Advances in interpretability techniques such as DL-Backtrace, Layer-wise Relevance Propagation, and Integrated Gradients make it possible to assess the contribution of individual components within neural networks trained on medical imaging tasks. In this work, we introduce an interpretability-guided pruning framework that reduces model complexity while preserving both predictive performance and transparency. By selectively retaining only the most relevant parts of each layer, our method enables targeted compression that maintains clinically meaningful representations. Experiments across multiple medical image classification benchmarks demonstrate that this approach achieves high compression rates with minimal loss in accuracy, paving the way for lightweight, interpretable models suited for real-world deployment in healthcare settings.
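The abstract's core idea, keeping only the most relevant units of each layer, can be sketched concretely. The snippet below is a minimal illustration, not the paper's implementation: it assumes per-unit relevance scores (e.g. summed LRP or Integrated Gradients attributions) are already available, and zeroes out the weights of low-relevance output units in a dense layer. The function name and `keep_ratio` parameter are illustrative choices.

```python
import numpy as np

def prune_by_relevance(weights, relevance, keep_ratio=0.3):
    """Zero out the least-relevant output units of one dense layer.

    weights:   (out_units, in_units) weight matrix
    relevance: per-output-unit relevance scores, shape (out_units,)
    """
    n_keep = max(1, int(round(keep_ratio * len(relevance))))
    keep = np.argsort(relevance)[-n_keep:]   # most relevant units
    mask = np.zeros(len(relevance), dtype=bool)
    mask[keep] = True
    pruned = weights.copy()
    pruned[~mask, :] = 0.0                   # drop low-relevance units
    return pruned, mask

# toy layer: 4 output units; relevance favors units 1 and 3
w = np.ones((4, 5))
r = np.array([0.1, 0.9, 0.2, 0.8])
pruned, mask = prune_by_relevance(w, r, keep_ratio=0.5)
```

In practice the zeroed rows would be physically removed (structured pruning) and the model briefly fine-tuned, but the selection criterion, attribution mass rather than weight magnitude, is the part the paper's framing turns on.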
Towards eliciting latent knowledge from LLMs with mechanistic interpretability
Cywiński, Bartosz, Ryd, Emil, Rajamanoharan, Senthooran, Nanda, Neel
As language models become more powerful and sophisticated, it is crucial that they remain trustworthy and reliable. There is concerning preliminary evidence that models may attempt to deceive or keep secrets from their operators. To explore the ability of current techniques to elicit such hidden knowledge, we train a Taboo model: a language model that describes a specific secret word without explicitly stating it. Importantly, the secret word is not presented to the model in its training data or prompt. We then investigate methods to uncover this secret. First, we evaluate non-interpretability (black-box) approaches. Subsequently, we develop largely automated strategies based on mechanistic interpretability techniques, including logit lens and sparse autoencoders. Evaluation shows that both approaches are effective in eliciting the secret word in our proof-of-concept setting. Our findings highlight the promise of these approaches for eliciting hidden knowledge and suggest several promising avenues for future work, including testing and refining these methods on more complex model organisms. This work aims to be a step towards addressing the crucial problem of eliciting secret knowledge from language models, thereby contributing to their safe and reliable deployment.
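One of the techniques the abstract names, the logit lens, is easy to sketch: project each layer's residual-stream state through the unembedding matrix and read off which tokens the model is "thinking of" at that depth, here applied to hunt for a secret word surfacing mid-network. This is a toy illustration with made-up dimensions and vocabulary, not the authors' pipeline.

```python
import numpy as np

def logit_lens(hidden_states, W_U, vocab, top_k=3):
    """For each layer's residual-stream vector, project through the
    unembedding matrix W_U and return the top-k tokens at that depth.

    hidden_states: list of (d_model,) vectors, one per layer
    W_U:           (d_model, vocab_size) unembedding matrix
    vocab:         list of token strings, len == vocab_size
    """
    readouts = []
    for h in hidden_states:
        logits = h @ W_U
        top = np.argsort(logits)[-top_k:][::-1]  # highest logits first
        readouts.append([vocab[i] for i in top])
    return readouts

# toy: 2-dim residual stream, 3-token vocabulary
vocab = ["cat", "dog", "sun"]
W_U = np.array([[3.0, 1.0, 0.0],
                [0.0, 0.0, 5.0]])
layers = [np.array([1.0, 0.0]),   # early layer reads out "cat"
          np.array([0.0, 1.0])]   # later layer reads out "sun"
reads = logit_lens(layers, W_U, vocab, top_k=1)
```

If a Taboo model's secret word appears among the intermediate readouts even though it is never emitted, that is exactly the kind of elicited hidden knowledge the paper investigates.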
Investigating the Effectiveness of Explainability Methods in Parkinson's Detection from Speech
Mancini, Eleonora, Paissan, Francesco, Torroni, Paolo, Ravanelli, Mirco, Subakan, Cem
Speech impairments in Parkinson's disease (PD) provide significant early indicators for diagnosis. While models for speech-based PD detection have shown strong performance, their interpretability remains underexplored. This study systematically evaluates several explainability methods to identify PD-specific speech features, aiming to support the development of accurate, interpretable models for clinical decision-making in PD diagnosis and monitoring. Our methodology involves (i) obtaining attributions and saliency maps using mainstream interpretability techniques, (ii) quantitatively evaluating the faithfulness of these maps and their combinations obtained via union and intersection through a range of established metrics, and (iii) assessing the information conveyed by the saliency maps for PD detection from an auxiliary classifier. Our results reveal that, while explanations are aligned with the classifier, they often fail to provide valuable information for domain experts.
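Step (ii) of the methodology, combining saliency maps via union and intersection and scoring their faithfulness, can be sketched with binary masks. The snippet below is a simplified illustration under stated assumptions: attributions are binarized at a quantile threshold, and "faithfulness" is approximated as the classifier-score drop when the salient region is removed; the quantile level and the toy scoring function are not from the paper.

```python
import numpy as np

def binarize(attr, q=0.6):
    """Keep attributions above the q-quantile as a binary saliency mask."""
    return attr >= np.quantile(attr, q)

def combine(mask_a, mask_b, mode="union"):
    """Merge two saliency masks by union or intersection."""
    return mask_a | mask_b if mode == "union" else mask_a & mask_b

def faithfulness(score_fn, x, mask):
    """Score drop when the salient region is zeroed; larger = more faithful."""
    return score_fn(x) - score_fn(x * ~mask)

# toy attributions from two different explainers over 5 features
a = np.array([0.9, 0.1, 0.8, 0.0, 0.2])
b = np.array([0.0, 0.7, 0.9, 0.1, 0.2])
ma, mb = binarize(a), binarize(b)

x = np.ones(5)
score = lambda v: float(v[2] + 0.5 * v[4])  # stand-in classifier logit
drop_union = faithfulness(score, x, combine(ma, mb, "union"))
```

The paper's point survives the simplification: a combined map can be faithful to the classifier (large score drop) while still telling a clinician little about PD-specific speech features.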
Reframing the Brain Age Prediction Problem to a More Interpretable and Quantitative Approach
Gianchandani, Neha, Dibaji, Mahsa, Bento, Mariana, MacDonald, Ethan, Souza, Roberto
Deep learning models have achieved state-of-the-art results in estimating brain age, an important brain health biomarker, from magnetic resonance (MR) images. However, most of these models only provide a global age prediction, and rely on techniques such as saliency maps to interpret their results. These saliency maps highlight regions in the input image that were significant for the model's predictions, but they are hard to interpret, and saliency map values are not directly comparable across different samples. In this work, we reframe the age prediction problem from MR images to an image-to-image regression problem where we estimate the brain age for each brain voxel in MR images. We compare voxel-wise age prediction models against global age prediction models and their corresponding saliency maps. The results indicate that voxel-wise age prediction models are more interpretable, since they provide spatial information about the brain aging process, and they benefit from being quantitative.
Truthful Meta-Explanations for Local Interpretability of Machine Learning Models
Mollas, Ioannis, Bassiliades, Nick, Tsoumakas, Grigorios
The integration of automated, machine-learning-based systems into a wide range of tasks has expanded as a result of their performance and speed. Although there are numerous advantages to employing ML-based systems, if they are not interpretable they should not be used in critical, high-risk applications where human lives are at stake. To address this issue, researchers and businesses have been focusing on finding ways to improve the interpretability of complex ML systems, and several such methods have been developed. Indeed, so many techniques now exist that it is difficult for practitioners to choose the best one for their application, even with the help of evaluation metrics. As a result, the demand for a selection tool, a meta-explanation technique based on a high-quality evaluation metric, is apparent. In this paper, we present a local meta-explanation technique built on top of the truthfulness metric, a faithfulness-based metric. We demonstrate the effectiveness of both the technique and the metric by concretely defining all the concepts and through experimentation.
An Attention Matrix for Every Decision: Faithfulness-based Arbitration Among Multiple Attention-Based Interpretations of Transformers in Text Classification
Mylonas, Nikolaos, Mollas, Ioannis, Tsoumakas, Grigorios
Transformers are widely used in natural language processing, where they consistently achieve state-of-the-art performance. This is mainly due to their attention-based architecture, which allows them to model rich linguistic relations between (sub)words. However, transformers are difficult to interpret. Being able to provide reasoning for its decisions is an important property for a model in domains where human lives are affected. With transformers finding wide use in such fields, the need for interpretability techniques tailored to them arises. We propose a new technique that selects the most faithful attention-based interpretation among the several ones that can be obtained by combining different head, layer and matrix operations. In addition, two variations are introduced towards (i) reducing the computational complexity, thus being faster and friendlier to the environment, and (ii) enhancing the performance in multi-label data. We further propose a new faithfulness metric that is more suitable for transformer models and exhibits high correlation with the area under the precision-recall curve based on ground truth rationales. We validate the utility of our contributions with a series of quantitative and qualitative experiments on seven datasets.
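The arbitration idea, enumerate several attention-based interpretations obtained with different head/layer reductions, then keep the most faithful one, can be sketched compactly. The snippet below is an illustrative simplification: it reads token importance from the [CLS] row of the attention tensor, tries three reductions (the paper explores many more combinations of head, layer, and matrix operations), and plugs in a caller-supplied faithfulness function.

```python
import numpy as np

def candidate_interpretations(attn):
    """attn: (layers, heads, seq, seq) attention tensor.
    Yield (name, token-importance vector) for a few head/layer
    reductions, reading importance from the [CLS] row (index 0)."""
    cls_row = attn[:, :, 0, :]                         # (layers, heads, seq)
    yield "mean_heads_last_layer", cls_row[-1].mean(axis=0)
    yield "max_heads_last_layer",  cls_row[-1].max(axis=0)
    yield "mean_heads_all_layers", cls_row.mean(axis=(0, 1))

def most_faithful(attn, faithfulness_fn):
    """Keep the reduction whose importance vector scores highest."""
    return max(candidate_interpretations(attn),
               key=lambda nv: faithfulness_fn(nv[1]))

# toy tensor: 2 layers, 2 heads, 3 tokens; one late head fixates on token 2
attn = np.zeros((2, 2, 3, 3))
attn[0, 0, 0] = [0.0, 0.0, 0.1]
attn[0, 1, 0] = [0.0, 0.0, 0.1]
attn[1, 0, 0] = [0.0, 0.0, 0.2]
attn[1, 1, 0] = [0.0, 0.0, 0.9]
name, imp = most_faithful(attn, lambda v: v[2])  # stand-in faithfulness score
```

A real faithfulness function would measure the model's score change when top-ranked tokens are removed; the per-decision arbitration means different inputs may end up with different winning reductions, which is the paper's titular point.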
Dealing with Various Cancers using Machine Learning, Part 2 (AI Health Care Series)
Abstract: The paper proposes a novel hybrid discovery Radiomics framework that simultaneously integrates temporal and spatial features extracted from non-thin chest Computed Tomography (CT) slices to predict Lung Adenocarcinoma (LUAC) malignancy with minimum expert involvement. Lung cancer is the leading cause of mortality from cancer worldwide and has various histologic types, among which LUAC has recently been the most prevalent. LUACs are classified as pre-invasive, minimally invasive, and invasive adenocarcinomas. Timely and accurate knowledge of lung nodule malignancy leads to a proper treatment plan and reduces the risk of unnecessary or late surgeries. Currently, chest CT is the primary imaging modality used to assess and predict the invasiveness of LUACs.
Exploration of Interpretability Techniques for Deep COVID-19 Classification using Chest X-ray Images
Chatterjee, Soumick, Saad, Fatima, Sarasaen, Chompunuch, Ghosh, Suhita, Krug, Valerie, Khatun, Rupali, Mishra, Rahul, Desai, Nirja, Radeva, Petia, Rose, Georg, Stober, Sebastian, Speck, Oliver, Nürnberger, Andreas
The outbreak of COVID-19 has shocked the entire world with its fairly rapid spread and has challenged different sectors. One of the most effective ways to limit its spread is the early and accurate diagnosis of infected patients. Medical imaging such as X-ray and Computed Tomography (CT), combined with the potential of Artificial Intelligence (AI), plays an essential role in supporting the medical staff in the diagnosis process. To this end, five different deep learning models (ResNet18, ResNet34, InceptionV3, InceptionResNetV2, and DenseNet161) and their Ensemble have been used in this paper to classify COVID-19, pneumoniae and healthy subjects using Chest X-Ray images. Multi-label classification was performed to predict multiple pathologies for each patient, if present. Foremost, the interpretability of each of the networks was thoroughly studied using local interpretability methods - occlusion, saliency, input X gradient, guided backpropagation, integrated gradients, and DeepLIFT - and using a global technique - neuron activation profiles. The mean Micro-F1 score of the models for COVID-19 classifications ranges from 0.66 to 0.875, and is 0.89 for the Ensemble of the network models. The qualitative results depicted the ResNets to be the most interpretable models. This research demonstrates the importance of using interpretability methods to compare different models before making the decision regarding the best-performing model.
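Of the local methods listed, occlusion is the most self-contained to illustrate: slide a blanking patch over the image and record how much the classifier's score drops at each position. The sketch below is a minimal single-channel version with an arbitrary patch size and a stand-in scoring function, not the study's actual models or settings.

```python
import numpy as np

def occlusion_map(score_fn, image, patch=4, baseline=0.0):
    """Slide a patch x patch occluder over a 2-D image; the importance of
    each patch position is the drop in score_fn when that region is
    replaced by the baseline value."""
    H, W = image.shape
    heat = np.zeros((H // patch, W // patch))
    base_score = score_fn(image)
    for i in range(0, H - patch + 1, patch):
        for j in range(0, W - patch + 1, patch):
            occluded = image.copy()
            occluded[i:i + patch, j:j + patch] = baseline
            heat[i // patch, j // patch] = base_score - score_fn(occluded)
    return heat

# toy 8x8 "X-ray" and a stand-in model that only looks at the top-left
img = np.ones((8, 8))
score = lambda x: float(x[:4, :4].sum())
heat = occlusion_map(score, img, patch=4)
```

Regions whose occlusion barely changes the score get near-zero heat, which is how such maps expose models that attend to clinically irrelevant areas.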
An Interactive Interpretability System for Breast Cancer Screening with Deep Learning
Deep learning methods, in particular convolutional neural networks, have emerged as a powerful tool in medical image computing tasks. While these complex models provide excellent performance, their black-box nature may hinder real-world adoption in high-stakes decision-making. In this paper, we propose an interactive system to take advantage of state-of-the-art interpretability techniques to assist radiologists with breast cancer screening. Our system integrates a deep learning model into the radiologists' workflow and provides novel interactions to promote understanding of the model's decision-making process. Moreover, we demonstrate that our system can take advantage of user interactions progressively to provide finer-grained explainability reports with little labeling overhead. Due to the generic nature of the adopted interpretability technique, our system is domain-agnostic and can be used for many different medical image computing tasks, presenting a novel perspective on how we can leverage visual analytics to transform originally static interpretability techniques to augment human decision making and promote the adoption of medical AI.
Explain Yourself - A Primer on ML Interpretability & Explainability
The project to define what the late Marvin Minsky refers to as a suitcase word -- words that have so much packed inside them, making it difficult for us to unpack and understand this embedded intricacy in its entirety -- has not been without its fair share of challenges. The term does not have a single agreed-upon definition, with the dimensions of description shifting from optimization or efficient search space exploration to rationality and the ability to adapt to uncertain environments, depending on which expert you ask. The confusion becomes more salient when one hears news of machines achieving super-human performance in activities like Chess or Go -- traditional stand-ins for high intellectual aptitude -- but failing miserably in tasks like grabbing objects or moving across uneven terrain, which most of us do without thinking. But several themes do emerge when we try to corner the concept. Our ability to explain why we do what we do makes a fair number of appearances in the list of definitions proposed by multiple disciplines.